Joseph Wright's English Dialect Dictionary (1898-1905) Computerised: architecture and retrieval routine
Abstract
Comment on Chapter 0: By way of an introduction to this conference of relatively heterogeneous participants – philologists and computer scientists being the two main groups – I would like to emphasise the difficulty of finding a common language in the interdisciplinary dialogue. While English is, of course, the lingua franca amongst the participants, there is still a risk of non- and misunderstanding. As I see it, computer philologists tend to produce theoretical concepts of how problems can be solved with the help of software, but have limited competence in identifying the problems. Philologists, on the other hand, with a good knowledge of the textual, historical and cultural embeddedness of their objects of research, often know the problems at issue, but may not be aware of the new computer-assisted tools and strategies available to solve them. Given this basic difference of competence, we should all try to "meet halfway", i.e. to cooperate. The first step towards cooperation is to clarify one’s own position. I therefore welcome the invitation of the organisers of the Dagstuhl conference to present a short self-introduction. Most corpus linguists either compile a corpus or use corpora compiled by others. My own academic past has led me to do both, compiling and analysing. Details of my corpus-compiling activity can be seen from the survey given above. Wright’s English Dialect Dictionary is the last in a series of corpus-compilation projects. As regards the application of corpora, it seems to me that both the historicity and complexity of corpus texts are often underestimated. The larger and the older the texts of a corpus are, the more likely they are to be based on variable and erstwhile norms, which rule out computer accessibility of the texts in their original shape.
Editorial practices (if editions are used for compilation) and the now unimaginable irregularities of spelling, at least in medieval and Early Modern English texts, are the two main arguments for the necessity of normalisation. If a word appears in shapes that cannot be predicted ad hoc, the results of query routines are bound to be misleading. For more detailed arguments along these lines, see my earlier publications Markus 1997 and 2000.
Publication date: 2006